Batch ContentSummaryLog queries in _consolidate_courses_data#14444
Conversation
|
Hi @devswithme - could you open an issue for this so we can discuss further before jumping to a solution? I agree with the assessment that there is some potential for optimization here, but I'd rather plan it out a bit more carefully. |
|
Yes can, Thank you @rtibbles. |
|
📢✨ Before we assign a reviewer, we'll turn on |
rtibblesbot
left a comment
There was a problem hiding this comment.
Clean optimization with correct semantics. CI passing. No UI files changed.
- suggestion: The cross-course batching has no multi-course progress test — see inline.
- praise: Empty-set guard and deduplication — see inline.
@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
Reviewed the pull request diff checking for:
- Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
- Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
- Architecture: duplicated concerns, minimal interfaces, composition over inheritance
- Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
- Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
- Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
- Checked CI status and linked issue acceptance criteria
- For UI changes: inspected screenshots for layout, visual completeness, and consistency
| ) | ||
|
|
||
| total_content = content_qs.count() | ||
| all_content_ids = {cid for ids in course_content_map.values() for cid in ids} |
There was a problem hiding this comment.
✅ Resolved — addressed in the current code.
suggestion: all_content_ids merges content across all N courses, but the test suite only has a single-course progress test (test_learner_course_progress_calculated_correctly). The multi-course batching path — where two courses share the same completed_ids set and each gets its own per-course intersection — has no direct test. Consider a test with 2 courses having different completion states to confirm the batched lookup produces the same per-course progress values as the old per-course queries did.
| total_content = content_qs.count() | ||
| all_content_ids = {cid for ids in course_content_map.values() for cid in ids} | ||
| completed_ids = set() | ||
| if all_content_ids: |
There was a problem hiding this comment.
✅ Resolved — addressed in the current code.
praise: Good guard — skips the ContentSummaryLog query entirely when no courses have any content. Also worth noting: using set() on the query result means the new code correctly handles the edge case where ContentSummaryLog has multiple rows for the same (user, content_id) pair (the model has no unique constraint). The old .count() approach would have overcounted in that scenario; the new approach counts each logical completion exactly once.
…_data The learn home/course consolidation issued two queries per enrolled course: a descendant COUNT and a ContentSummaryLog COUNT for completed content (2N total). Gather each course's descendant content ids in a single pass and resolve completed content across all courses in one ContentSummaryLog query (N+1). Output is unchanged: total stays the descendant row count and completed stays the distinct count of matching summary logs (set intersection), matching the original filter(content_id__in=...).count() semantics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2afebc7 to
b0532aa
Compare
rtibblesbot
left a comment
There was a problem hiding this comment.
0 prior findings open; all acknowledged — maintainer approved with the IN-clause SQLite variable-count tradeoff noted.
Prior-finding status
ACKNOWLEDGED — kolibri/plugins/learn/viewsets.py:162 — Correct handling of duplicate content_ids [praise]
ACKNOWLEDGED — kolibri/plugins/learn/viewsets.py:152 — IN clause grows with total enrolled content, not per-course [suggestion]
@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
Ran an automatic code-only delta review triggered by new commits on a previously reviewed PR:
- Retrieved prior bot reviews via the GitHub API
- Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
- Only raised NEW findings for newly introduced code
- Core review pass only — specialized frontend/backend lenses and manual QA run when a review is explicitly requested
- Synthesized one review from the passes and chose the verdict from the findings, CI status, and QA evidence
Build Artifacts
Smoke test screenshot |
Summary
Reduces database queries in the learn plugin at
_consolidate_courses_datafrom 2N to N+1, where N is the number of courses a learner is enrolled in. Previously, for each course node the function issued two separate queries: oneCOUNTfor non-topic descendants and oneCOUNTonContentSummaryLogfor completed content. On a learner's home page or course list with 10 enrolled courses, this produced 20 queries.References
Closes #14453.
Reviewer guidance
_consolidate_courses_datainkolibri/plugins/learn/viewsets.py.pytest kolibri/plugins/learn/test/test_learner_course.pyandpytest kolibri/plugins/learn/test/test_learner_classroom.pytoconfirm all existing tests pass.
ContentSummaryLogcontent_ids (set(descendant_ids) & completed),preserving the original
filter(content_id__in=...).count()semanticswhen a course contains the same content_id in more than one node.
AI usage
I used Claude to identify the N+1 query pattern and propose the batching approach and reviewed the generated code for correctness specifically verifying UUID type consistency between
ContentNodeandContentSummaryLog, and the empty-set guard, and confirmed the change produces identical output across all existing test cases before accepting it.